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ABSTRACT 


The  general  principle  of  a  posteriori  forward  error  analysis  is 
discussed.  The  fundamental  idea  is  simply  based  on  the  fact  that  the 
difference  between  the  computed  result  of  any  of  the  basic  floatino- 
point  operations  and  the  exact  result  can  be  estimated  usino  the 
computed  result.  For  algorithms  with  finite  number  of  arithmetic 
operations,  this  idea  can  he  extended  easily  so  that  forward  error 
analysis  is  possible.  Some  results  of  certain  useful  algorithms 
are  derived  usino  this  approach. 
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1.  Introduction. 

In  recent  years  considerable  attention  has  been  given  to  the  effect  of 
rounding  errors  upon  the  numerical  solution  of  various  problems  involving 
algebraic  processes.  An  outstanding  contribution  on  this  topic  has  been 
made  by  J.H.  Wilkinson  [1,2,3].  His  backward  error  analysis  shows  that 
the  computed  results  are  the  exact  solutions  to  a  perturbed  problem  and 
the  bounds  for  the  perturbations  can  be  obtained  numerically.  In  other 
words,  if  we  are  computing  a  mathematical  expression  given  by 

y’5(*,.x2 . x„i  0.1) 

the  backward  analysis  shows  that  the  computed  y  satisfies  exactly  a  per¬ 
turbed  equation  of  the  form 

y  ■  g(x1+c1,x2+c2,...,xn+en)  (1.2) 

where  are  perturbations  whose  bounds,  in  general,  could  be  obtained. 

From  now  on  we  will  only  consider  normalized  floatina-point  computations 
with  t  bits  allocated  to  the  mantissa  of  a  floating-point  number.  In  this 
setting  the  backward  error  analysis  is  based  on  the  reoeated  use  of  the 
followino  lemma  [3]: 

Lemma  1.  Let  *  denote  any  of  the  operators  +,  -,  x»  /•  Then 

fl(x  *  y)  =  (x  *  y) (1  +  6)  | <5 1  <  2_t  =  u  (1.3) 

where  x  and  y  are  any  real  numbers  and  fl(x  *  y)  is  the  correctly  rounded 
result  of  the  floating  operation  *. 
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We  observe  that  Lemma  1  Is  Indeed  Itself  the  result  of  backward  error 
analysis.  It  expresses  the  result  of  any  floating-point  operation  as  the 
result  of  an  exact  arliimotlc  operation  on  slightly  perturbed  data  and  the 
bound  for  the  perturbation  is  known.  Therefore,  If  the  x^  are  not  known 
exactly,  the  backward  analysis  will  enable  us  to  decide  whether  the  solu¬ 
tion  obtained  numerically  Is  as  good  as  the  original  data  warrants  by  com¬ 
paring  the  bounds  for  the  with  the  known  errors  In  x^ ;  however.  It  does 
not  tell  us  how  much  Is  the  difference  between  the  computed  y  and  the 
exact  y  if  the  original  set  of  data  Is  regarded  as  exact.  This  problem 
can  be  solved  If  a  forward  error  analysis  can  be  carried  out  to  trace  the 
forward  propagation  of  Individual  rounding  errors  and  then  to  compare  the 
computed  results  with  those  which  exact  computation  would  have  produced. 
Furthermore,  this  analysis  Is  useful  only  if  the  difference  between  the 
computed  results  and  the  exact  results  Is  a  simple  function  of  the  computed 
results.  In  other  words,  a  useful  analysis  should  show  that 

y  -  y  *  e(y)  (1.4) 

where  e(y)  Is  some  simple  function  of  y.  The  letter  requirement  Is  neces¬ 
sary  to  obtain  bounds  for  the  errors  e(y). 

We  wilt  show  In  this  paper  that  this  a  posteriori  forward  error 
analysis  Is  possible  by  properly  modifying  Lemma  1  such  that  the  modified 
lemma  satisfies  the  requirement  expressed  In  equation  (1.4).  Some  common 
floating-point  operations  are  then  analyzed  and  the  results  applied  to 
some  specific  algorithms. 
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2.  The  basic  lemma. 

To  satisfy  the  requirement  stated  in  (1.4),  we  simply  move  the  factor 
(1  +  6)  in  equation  (1.3)  to  the  left-hand  side  of  the  equation,  thus  we 
have 

Lemma  2.  If  x  and  y  are  two  given  floating-point  numbers,  and  *  is 
used  to  denote  any  of  the  operators  +,  -,  x.  /•  Then 

(1  +  A>  f1(  x  *  y)  '  x  *  y,  1  +  a  *  |a!  <  2_t  =  u  (2.1) 

Hote  from  Lemma  2  the  difference  between  computed  result  fl(x  *  y)  and 
the  exact  result  x  *  y  is  (a)  fl(x  *  y)  which  is  a  function  of  the 
computed  result.  Since  most  of  the  computations  are  carried  out  by 
using  tnese  operators  sequentially,  the  error  at  each  operation  could 
tnus  be  monitored  by  the  use  of  this  lemma. 

In  the  formulation  of  Lemma  2,  we  have  treated  the  division  operator  / 
on  the  same  basis  as  that  of  addition  +,  subtraction  -,  or  multiplication  x 
namely,  division  is  regarded  as  an  independent  operation  which  Is  distinct 
from  the  additive  or  the  multiplicative  operators  +,  -,  or  x*  However, 
the  division  could  also  be  carried  out  by  considering  alternatively  the 
following  problem:  namely,  for  given  x  and  y,  an  unknown  z  is  sounht 
such  that,  without  actually  perform' nn  the  division  2L*  we  have 

yz  =  x  (2.2) 

In  this  respect  v/e  are  decomposing  x  into  a  product  yz.  Thus  the  com¬ 
putational  equation  corresponding  to  (2.2)  is 
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yz(l  +  A')  *  x 


\t'\  <_  u 


(2.3) 


which  can  only  tell  us  the  difference  between  the  computed  decomnosi tion 
yz  and  the  exact  decomposition  x.  Hence  the  algorithm  used  for  (2.2)  is 
'analytic"  in  nature.  On  the  other  hand,  if  division  is  done  to  find 
z  =  y-,  then  this  ooeration  is  "synthetic"  as  two  unknowns  x  and  y  are 
"synthesized"  to  form  z.  The  addition,  subtraction,  and  multiplication 
operations  can  all  be  considered  as  "synthetic"  in  this  respect.  We 
will  see  later  that  this  idea  can  also  be  used  to  classify  algorithms 
and  to  interpret  the  results  of  error  analysis. 
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3.  Errors  of  extended  products  and  sums. 


We  first  consider  the  extended  product  n,  defined  as 

Pn  =  fl(xrx2,...  ,xn)  (3.1) 

We  assume  henceforth  that  x^  are  floating-point  numbers  and  that  operations 
take  place  in  the  order  in  which  they  are  written.  We  use  the  followino 
recursive  algorithm  to  evaluate  (3.1): 

Pi  =  x, 

Pk+1  =  fl(pkxk+1)  k  =  1 .2 . n_! .  (3.2) 

Wow  applyino  Lemma  2  to  (3.2),  we  have 
Pi  =  x, 

pkLl^  +  ck+l  ^  =  pkxk+l  k  =  (3‘3) 

‘ Lk+1 ‘  -  u 

Hence  in  general  we  have 


p  5  (I  +  c  )  =  0  x. 

1-2  1  i=l  1 


(3.4) 


It  can  be  shown  [3]  that  if  n-1  is  appreciably  smaller  that  2l ,  then 


(1  -  u) 


n-1 


5  (1+t.)  =  1  +  [  <  (1  +  u) 
i=2  1 


n-1 


(3.5) 


Thus  we  have  proved  the  following  lemma: 


Lemma  3-  For  an  extended  product  of  n  floati no-point  numbers  defined 
in  (3.1),  we  have 

pn(1  +  E)  =  xi  (3.6) 

1  =  1 

Where  1  +  E  satisfies  (3.5). 

For  an  extended  sum  defined  as 
n 

s„  *  x(),  (3.7) 

we  can  similarly  define  the  recursion 
S1  =  X1 

sk+l  =  fl(sk  +  xk+l}  k  =  n-1  (3.3) 

to  carry  out  the  computation.  Applying  Lemma  2  repeatedly  to  (3.C),  we 
have  the  following  lemma- 


f 


We  see  that  the  errors  generated  in  the  computation  of  extended 
r  oduct  and  that  of  extended  sum  can  each  be  estimated  after  the  com¬ 
putation  by  using  Lemmas  3  and  4  respectively.  Observe  that  in  extended 
product  the  upper  bound  for  the  relative  error  term  E  is  independent  of 
the  computation  order  as  is  shown  in  (3.5).  On  the  other  hand,  the 
atsolute  error  in  the  evaluation  of  extended  sum  does  depend  on  the 
order  they  are  added.  If  all  x.  are  of  the  same  sign,  then  from  (3.10) 
the  upper  bound  for  the  absolute  error  |ej  is  smallest  if  the  terms  are 
added  in  order  of  increasing  magnitude.  Furthermore,  if  the  x^  are  of 
different  signs,  then  it  is  also  advisable  to  prearrange  xi  such  that 
they  are  in  the  order  of  increasing  magnitude  with  alternate  signs. 
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4.  Applications 


Apply i ng  the  previous  lemmas  to  the  analysis  of  alaorithms  for  inner 
product  evaluation,  polynominal  evaluation  and  matrix  decomposition,  we 
have  the  following  results: 

Theorem  1  (  Inner  Product  Evaluation).  Let  the  inner  product  defined 


as 

n 

t  »  fl(  i  a.b.) 

1-1  1  1 

by  the  following  algorithm: 

(4.1) 

be  computed 

Step  A. 

Compute 

ci  =  fl  (a1-bi )  i  =  1,2,...  ,n. 

(4.2) 

Step  B. 

Compute 

tR+1  =  fl(tk  +  ck+1)  k  =  1 ,2,. . .  ,n.  (4.3) 


Then  we  have 


t  + 


n 

r 

i  =  l 


aibi 


where 


n  n 

iJ  <  u  [  t  i c. !  +  z  1 1 . [  ] 

1  '  i=1  '  i=z  1 


(4.4) 


(4.3) 
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Proof.  Applying  Lemma  2  to  (4.2)  and  (4.3),  we  have 


c.(l  +  Aj)  =  a.b.  | a |  <_  u,  i  =  l,2,...,n 


(4.6) 


tk+l ^  +  5k+l ^  =  lk  +  ck+l  ’  1 6k+l *  -  u*  k  =  1 ,2,. . .  ,n-l . 

(4.7) 


t  =  t. 


Adding  (4.6)  and  (4.7)  for  all  i  and  k  respectively,  we  have 


n 


n 


n 


c.  +  c1a1  =  z  a.b. 
1=1  1  1=1  1  1  1=1  1  1 


(4.8) 


t  + 


n-1 

r 

k=l 


tk+l6k+l 


n 

z 

1=1 


ci 


Conbininn  (4.8)  and  (4.9),  we  have  finally 

t  *  ,:t  •  ^  a,b( 
n  n 

Where  =  I  t. j.  +  Z  C-6- 

r  1=2  1  1  1=1 


(4.9) 


(4.10) 

(4.11) 


Clearly  (4.11)  satisfies  (4.5)  and  thus  the  proof  is  now  complete. 

We  see  the  format  of  the  error  in  Theorem  1  is  similar  to  that  in 
Lemma  4.  Therefore  the  comments  rceardin^  computation  order  for  extended 
sum  are  also  applicable  in  the  present  case. 

Theorem  2.  ("ested  Polynomial  T valuation) .  To  evaluate  a  polynomial 


defined  a 


p(x)  =  f1[a0xn  +  a^0'1  +  ...  +  anl, 


(4.12) 


let  the  following  nested  algorithm  be  applied: 


bo  5  a0 


ck+1  =  fl(bkx)  k  =  0,1,2 . n-1 


(4.13) 


Vi  ’  f:(ck.i  *  Vi> 


p(x)  =  b 


Then  we  have 


Cl'x 


x  a.x 
i=0  1 


(4.14) 


where 


i  i  u  [  ,:bi  i  +  ! ci  j  ],  i  =  1 ,2,. . .  ,n 


(4.15) 


Proof.  Applying  Lemma  2  to  (4.13),  we  have 


b0  "  a0 


ck+l  "  t;kX  '  Ck+1  k+1  ’ 


k+1  1  - 


(4.10 


•  Vi  ‘  Vi  -  *WW  vw:  k  *  ;,J’? . 


p(x)  =  b. 


Simplifying,  wo  have 


bk+1  =  bkx  +  ak+1  '  ck+l6k+l  ‘  bk+l6>k+l  k=0,1 ,2,‘ *  * ,n"1 

(4.17) 

p(x)  =  bn 

By  repeatedly  substituting  (4.17)  for  b^  startino  from  k  =  n-1 ,  we  have 

p(x)  +  l  t..xn-^  =  z  a-x11  ’  (4.18) 

i=1  1  i-o 

where 

ci  =  ci6i  +  V'i  (4.19) 


The  theorem  follows  from  (4.19) 

We  observe  from  (4.18)  that  the  error  term  depends  not  only  on  the 
computed  values  b^  and  ,  but  also  depends  on  the  powers  of  x,  which  are 
not  explicitly  computed  and  hence  are  unknowns.  Thus,  extra  computations 
are  needed  if  error  bound  is  to  be  estimated. 

Theorem  3.  (Unnested  Polynomial  Evaluation).  If  the  polynomial  de¬ 
fined  in  (4.12)  is  evaluated  by  the  following  algorithm: 
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Step  S.  Compute 


z 


n 


a 


n 


z . 
,1 


fl (a  .y .) 
J  J 


Step  C.  Compute 


,i  =  n-1 ,  n-2 ,  ....  0 


(4.21) 


s 


n 


z 


n 


si  =  fl^si+l  +  2i'  1  =  n"1»  n*2»  (4.22) 

o(x)  =  sQ 


then  we  have 


p(x)  + 

n-1 

T. 

i=0 

(1  +  «i )  ZiAi  + 

e  =  T.  a.xn_1 

P  i=0  1 

(4.23) 

where 

1^1  < 

u. 

(1  -  u)n_1  <  1  + 

A.  <_  (1  +  u)n  1  and 

Mi 

u 

i 

T  C  +  Is .  1 

=0  1  1 

] 

(4.24) 

The  proof  is  similar  to  ci.cn.  of  Theorem  2. 

We  should  note  that  although  an  extra  n  multiplications  are  needed 
for  this  unnested  algorithm,  we  do  have  results  which  are  useful  for 
error  estimation.  This  is  shown  in  (4.23).  Furthermore,  if  extra  com¬ 
putations  are  carried  out  to  estimate  the  error  in  (4.14)  resulting 
from  the  nested  algorithm,  then  the  unnested  and  nested  algorithm  are 
equivalent  in  terms  of  number  of  operations. 
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Theorem  4  (Matrix  Decomposition).  For  the  matrix  equation 


Rx  =  b 


(4.25) 


where  r;  =  (r.^)  is  an  n  by  n  non-singular  lower  triangular  matrix 
and  b  is  an  n- vector,  then  the  component  of  x  can  be  computed  in  the 
order  of  x^ .x^,.. . ,xn  by  using  the  followino  substitution  algorithm: 


X1  =  fHrJ 


-r..x.-r.„x0-. . .-r.  .  ,x.  ,+b. 

x  =  fi  ill  i2  2  i  ,1-1  i-1  i 


i  =  2,3,... ,n 


If  (4.26)  for  each  x.  is  computed  in  the  followino  sequence: 


Compute 


(4.26) 


y,-i,  3  fH-r..  X,  )  k  =  1,2,... , i - 1 


(4.27) 


Step  D.  Compute 


s^.  =  fl(yn  +  yi2  +  ...  +  yii)  j  =  1,2 . i-1  (4.28) 


Step  C.  Compute 


zi  *  *  bi> 


(4. 2D) 
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then  the  computed  x  satisfies 


Rx  +  n  =  b  (4.30) 

where  n  *  (n^)  Is  an  error  vector  such  that 
1*1 1  <  u*  |r„x,| 

I  n1l  <  e*  (r^x^f  +  |  ,1  -l !  *  =  2,3,. ...n  (4.31) 

and  (1  -  u)2  _<  1  +  e  £  (1  +  u)2,  je,  4_J  <.  u  [  i  |y1k|  +  z  |sik|  ] 

1,1  1  k«l  1K  k-z 

The  proof  can  be  obtained  easily  by  applying  Lemmas  2  and  4  to  equations 
(4.27),  (4.28)  and  (4.29). 

We  observe  from  (4.30)  that  If  we  are  to  find  the  error  between  the 
computed  solution  x  and  the  exact  solution  R~^b,  then  it  is  easily  seen 
that 

R_1b  -  x  =  R-1r)  (4.32) 

Thus  the  error  is  a  function  of  R"^  which,  just  like  the  powers  of  x  In 
nested  polynomial  equation,  has  never  been  explicitly  computed.  Extra 
computations  are  therefore  necessary  for  error  estimation. 

We  might  argue  that  the  results  of  Theorem  2  and  Theorem  4  do  not 
satisfy  the  requirement  of  (1.4)  for  useful  a  posteriori  error  analysis 

even  If  Lerrma  2  Is  used  consistently  in  the  analysis.  However,  we  should 
realize  that  algebraically  we  have  assumed  that  the  powers  of  x  and  R*1 
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are  implicitly  generated  by  the  algorithms  In  Theorem  2  and  Theorem  4 
respectively.  Unfortunately,  these  "efficient"  algorithms  do  not  yield 
these  necessary  data  computationally.  Therefore,  the  "Inefficient"  un¬ 
nested  algorithm  is  a  "better"  algorithm  for  polynomial  evaluation  from 
the  point  of  view  of  error  estimation.  Similarly  we  could  conjecture 
that  Cramer's  rule  might  be  a  "better"  algorithm  than  substitution 
algorithm  for  solving  triangular  system  of  equations  from  this  respect. 
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5.  Conclusions. 

We  have  seen  for  certain  algorithms  the  repeated  use  of  Lemma  2  can 
lead  to  useful  a  posteriori  results  for  error  estimations.  For  some 
"efficient"  algorithms  the  results  are  not  very  "useful".  This  is  be¬ 
cause  that  in  order  to  be  "efficient"  certain  steps  of  computations 
have  to  be  skipped  which  result  in  insufficient  data  for  error  estima¬ 
tion.  Another  explanation  is  as  follows:  The  nested  algorithm  in 

n  % 

Theorem  2  essentially  decomposes  the  original  polynomial  i  a.x 

1=0  1 

into  a  nested  product  {...[(a0x  +  a^)  x  +  a2]x  +...}  x  +  an  with  the 
assumption  that  the  powers  of  x  are  generated  implicitly;  similarly 
the  substitution  algorithm  In  Theorem  4  effectively  decomposes  b  into 
a  product  Rx-  Hence  they  are  "analytic"  In  nature.  If  we  are  asking 
only  how  good  is  the  decomposition,  then  Theorems  2  and  4  do  give  us 
"useful"  results  concerning  the  difference  between  computed  decomposition 
and  exact  decomposition.  Hence  they  are  indeed  useful  a  posteriori 
results.  On  the  other  hand,  the  algorithms  used  In  Theorem  1  and  3 
are  "synthetic"  in  nature  as  results  are  "synthesized"  step  by  step 
without  taking  "efficient"  short  cuts.  These  observations  are  con¬ 
sistent  with  the  basic  results  of  (2.3)  where  the  division  z  =  p-  Is 
considered  as  an  analytic  process  If  z  is  computed  such  that  yz  *  x 
without  actually  carrying  out  the  division.  The  distinction  between 
"analytic"  and  "synthetic"  processes  Is  therefore  essential  In  inter¬ 
preting  the  results  of  error  analysis.  Furthermore,  for  matrix  equations 
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of  the  type  Ax  *  b  where  A  is  a  general  n  by  n  non-singular  matrix,  it 
Is  well  known  that  the  closeness  of  Ax  to  b  does  not  necessarily  guaran 
tee  the  closeness  of  x  to  A’^b;  thus  it  is  questionable  that  we  should 
use  “efficient"  analytic  algorithms  instead  of  us'^ng  "inefficient"  syn¬ 
thetic  algorithms  if  ultimate  error  estimation  is  required  for  the  solu 
tion  of  these  systems. 
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